Results 1 - 20 of 28
1.
NPJ Digit Med ; 7(1): 98, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38637674

ABSTRACT

Accurate prediction of recurrence and progression in non-muscle invasive bladder cancer (NMIBC) is essential to inform management and eligibility for clinical trials. Despite substantial interest in developing artificial intelligence (AI) applications in NMIBC, their clinical readiness remains unclear. This systematic review aimed to critically appraise AI studies predicting NMIBC outcomes, and to identify common methodological and reporting pitfalls. MEDLINE, EMBASE, Web of Science, and Scopus were searched from inception to February 5th, 2024 for AI studies predicting NMIBC recurrence or progression. APPRAISE-AI was used to assess methodological and reporting quality of these studies. Performance of the AI and non-AI approaches included within these studies was compared. A total of 15 studies (five on recurrence, four on progression, and six on both) were included. All studies were retrospective, with a median follow-up of 71 months (IQR 32-93) and median cohort size of 125 (IQR 93-309). Most studies were low quality, with only one classified as high quality. While AI models generally outperformed non-AI approaches with respect to accuracy, c-index, sensitivity, and specificity, this margin of benefit varied with study quality (median absolute performance difference was 10 for low, 22 for moderate, and 4 for high quality studies). Common pitfalls included dataset limitations, heterogeneous outcome definitions, methodological flaws, suboptimal model evaluation, and reproducibility issues. Recommendations to address these challenges are proposed. These findings emphasise the need for collaborative efforts between urological and AI communities paired with rigorous methodologies to develop higher quality models, enabling AI to reach its potential in enhancing NMIBC care.

3.
J Pediatr Urol ; 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38331659

ABSTRACT

INTRODUCTION: Artificial intelligence (AI) and machine learning (ML) in pediatric urology are gaining increased popularity and credibility. However, the literature lacks standardization in reporting and there are areas for methodological improvement, which makes comparison between studies difficult and may ultimately hurt clinical implementation of these models. The "STandardized REporting of Applications of Machine learning in UROlogy" (STREAM-URO) framework provides methodological instructions to improve transparent reporting in urology, and APPRAISE-AI is a critical appraisal tool which provides quantitative measures for the quality of AI studies. The adoption of these will allow urologists and developers to ensure consistency in reporting, improve comparison, develop better models, and hopefully inspire clinical translation. METHODS: In this article, we have applied the STREAM-URO framework and APPRAISE-AI tool to the pediatric hydronephrosis literature. By doing this, we aim to describe best practices on ML reporting in urology with STREAM-URO and provide readers with a critical appraisal tool for ML quality with APPRAISE-AI. By applying these to the pediatric hydronephrosis literature, we provide a tutorial for other readers to employ these in developing and appraising ML models. We also present itemized recommendations for adequate reporting, and critically appraise the quality of ML in pediatric hydronephrosis thus far. We provide examples of strong reporting and highlight areas for improvement. RESULTS: There were 8 ML models applied to pediatric hydronephrosis. The 26-item STREAM-URO framework is provided in Appendix A and the 24-item APPRAISE-AI tool is provided in Appendix B. Across the 8 studies, the median compliance with STREAM-URO was 67% and overall study quality was moderate.
The highest scoring APPRAISE-AI domains in pediatric hydronephrosis were clinical relevance and reporting quality, while the worst were methodological conduct, robustness of results, and reproducibility. CONCLUSIONS: If properly conducted and reported, ML has the potential to impact the care we provide to patients in pediatric urology. While AI is exciting, the paucity of strong evidence limits our ability to translate models to practice. The first step toward this goal is adequate reporting and ensuring high quality models, and STREAM-URO and APPRAISE-AI can facilitate better reporting and critical appraisal, respectively.

4.
BJU Int ; 133(1): 79-86, 2024 01.
Article in English | MEDLINE | ID: mdl-37594786

ABSTRACT

OBJECTIVE: To sensitively predict the risk of renal obstruction on diuretic renography using routinely reported ultrasonography (US) findings, coupled with machine learning approaches, and determine safe criteria for deferral of diuretic renography. PATIENTS AND METHODS: Patients from two institutions with isolated hydronephrosis who underwent a diuretic renogram within 3 months following renal US were included. Age, sex, and routinely reported US findings (laterality, kidney length, anteroposterior diameter, Society for Fetal Urology [SFU] grade) were abstracted. The drainage half-times were collected from renography and stratified as low risk (<20 min, primary outcome), intermediate risk (20-60 min), and high risk of obstruction (>60 min). A random forest model was trained to classify obstruction risk, here named the 'Artificial intelligence Evaluation of Renogram Obstruction' (AERO). Model performance was determined by measuring area under the receiver-operating-characteristic curve (AUROC) and decision curve analysis. RESULTS: A total of 304 patients met the inclusion criteria, with a median (interquartile range) age at diuretic renogram of 4 (2-7) months. Of all patients, 48 (16%) were low risk, 102 (33%) were intermediate risk, and 156 (51%) were high risk of obstruction based on diuretic renogram. The AERO achieved a binary AUROC of 0.84, multi-class AUROC of 0.74 that was superior to the SFU grade, and external validation (n = 64) binary AUROC of 0.76. The most important features for prediction included age, anteroposterior diameter, and SFU grade. We deployed our model as an easy-to-use application (https://sickkidsurology.shinyapps.io/AERO/). At a threshold probability of 30%, the AERO would allow 66 more patients per 1000 to safely avoid a renogram without missing significant obstruction compared to a strategy in which a renogram is routinely performed for SFU Grade ≥3.
CONCLUSIONS: Coupled with machine learning, routine US findings can improve the criteria to determine in which children with isolated hydronephrosis a diuretic renogram can be safely avoided. Further optimisation and validation are required prior to implementation into clinical practice.
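
The decision curve analysis mentioned in this abstract is built on the net benefit of acting on a model's predictions at a chosen threshold probability. The sketch below is not the authors' code: it assumes the standard net benefit formula (true positives minus false positives weighted by the odds of the threshold, both per patient), and the outcome labels and predicted risks are hypothetical toy values.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # A false positive is penalised by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


# Hypothetical toy data: 1 = outcome present, 0 = outcome absent.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.8, 0.35, 0.05, 0.25, 0.5]

nb_model = net_benefit(y_true, y_prob, 0.30)
# "Intervene on all" reference strategy: every patient is above any threshold.
nb_all = net_benefit(y_true, [1.0] * len(y_true), 0.30)
print(round(nb_model, 4), round(nb_all, 4))
```

Plotting net benefit for the model and for the "intervene on all" and "intervene on none" strategies across a range of thresholds gives the decision curve.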


Subject(s)
Hydronephrosis , Ureteral Obstruction , Humans , Child , Infant , Artificial Intelligence , Hydronephrosis/diagnostic imaging , Radioisotope Renography , Ultrasonography , Diuretics/therapeutic use , Machine Learning , Ureteral Obstruction/diagnostic imaging , Retrospective Studies
7.
JAMA Netw Open ; 6(9): e2335377, 2023 09 05.
Article in English | MEDLINE | ID: mdl-37747733

ABSTRACT

Importance: Artificial intelligence (AI) has gained considerable attention in health care, yet concerns have been raised around appropriate methods and fairness. Current AI reporting guidelines do not provide a means of quantifying overall quality of AI research, limiting their ability to compare models addressing the same clinical question. Objective: To develop a tool (APPRAISE-AI) to evaluate the methodological and reporting quality of AI prediction models for clinical decision support. Design, Setting, and Participants: This quality improvement study evaluated AI studies in the model development, silent, and clinical trial phases using the APPRAISE-AI tool, a quantitative method for evaluating quality of AI studies across 6 domains: clinical relevance, data quality, methodological conduct, robustness of results, reporting quality, and reproducibility. These domains included 24 items with a maximum overall score of 100 points. Points were assigned to each item, with higher points indicating stronger methodological or reporting quality. The tool was applied to a systematic review on machine learning to estimate sepsis that included articles published until September 13, 2019. Data analysis was performed from September to December 2022. Main Outcomes and Measures: The primary outcomes were interrater and intrarater reliability and the correlation between APPRAISE-AI scores and expert scores, 3-year citation rate, number of Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) low risk-of-bias domains, and overall adherence to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement. Results: A total of 28 studies were included. Overall APPRAISE-AI scores ranged from 33 (low quality) to 67 (high quality). Most studies were moderate quality. The 5 lowest scoring items included source of data, sample size calculation, bias assessment, error analysis, and transparency. 
Overall APPRAISE-AI scores were associated with expert scores (Spearman ρ, 0.82; 95% CI, 0.64-0.91; P < .001), 3-year citation rate (Spearman ρ, 0.69; 95% CI, 0.43-0.85; P < .001), number of QUADAS-2 low risk-of-bias domains (Spearman ρ, 0.56; 95% CI, 0.24-0.77; P = .002), and adherence to the TRIPOD statement (Spearman ρ, 0.87; 95% CI, 0.73-0.94; P < .001). Intraclass correlation coefficient ranges for interrater and intrarater reliability were 0.74 to 1.00 for individual items, 0.81 to 0.99 for individual domains, and 0.91 to 0.98 for overall scores. Conclusions and Relevance: In this quality improvement study, APPRAISE-AI demonstrated strong interrater and intrarater reliability and correlated well with several study quality measures. This tool may provide a quantitative approach for investigators, reviewers, editors, and funding organizations to compare the research quality across AI studies for clinical decision support.
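
The associations reported in this abstract are Spearman rank correlations. As a rough illustration of what that statistic measures, the sketch below computes Spearman's rho as the Pearson correlation of average ranks. The paired scores are hypothetical and are not data from the study; the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores for six studies (tool score vs expert score).
tool_scores = [33, 41, 48, 52, 60, 67]
expert_scores = [2, 3, 5, 4, 6, 7]
rho = spearman_rho(tool_scores, expert_scores)
print(round(rho, 3))
```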


Subject(s)
Artificial Intelligence , Clinical Decision Support Systems , Humans , Reproducibility of Results , Machine Learning , Clinical Relevance
9.
Can Urol Assoc J ; 17(11): E395-E401, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37549345

ABSTRACT

INTRODUCTION: The use of artificial intelligence (AI) in urology is gaining significant traction. While previous reviews of AI applications in urology exist, there have been few attempts to synthesize existing literature on urothelial cancer (UC). METHODS: Comprehensive searches based on the concepts of "AI" and "urothelial cancer" were conducted in MEDLINE, EMBASE, Web of Science, and Scopus. Study selection and data abstraction were conducted by two independent reviewers. Two independent raters assessed study quality in a random sample of 25 studies with the prediction model risk of bias assessment tool (PROBAST) and the standardized reporting of machine learning applications in urology (STREAM-URO) framework. RESULTS: From a database search of 4581 studies, 227 were included. By area of research, 33% focused on image analysis, 26% on genomics, 16% on radiomics, and 15% on clinicopathology. Thematic content analysis identified qualitative trends in AI models employed and variables for feature extraction. Only 19% of studies compared performance of AI models to non-AI methods. All selected studies demonstrated high risk of bias for analysis and overall concern with Cohen's kappa (k)=0.68. Selected studies met 66% of STREAM-URO items, with k=0.76. CONCLUSIONS: The use of AI in UC is a topic of increasing importance; however, there is a need for improved standardized reporting, as evidenced by the high risk of bias and low methodologic quality identified in the included studies.

10.
Can Urol Assoc J ; 17(8): 243-246, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37581544

ABSTRACT

INTRODUCTION: Vesicoureteral reflux (VUR) is commonly diagnosed in the workup of urinary tract infections or hydronephrosis in children. Traditionally, VUR severity is graded subjectively based on voiding cystourethrogram (VCUG) imaging. Herein, we characterized the association between age, sex, and indication for VCUG, by employing standardized quantitative features. METHODS: We included renal units with a high certainty in VUR grade (>80% consensus) from the qVUR model validation study at our institution between 2013 and 2019. We abstracted the following variables: age, sex, laterality, indication for VCUG, and qVUR parameters (tortuosity, ureter widths on VCUG). High-grade VUR was defined as grade 4 or 5. The association between each variable and VUR grade was assessed. RESULTS: A total of 443 patients (523 renal units) were included, consisting of a 48:52 male/female ratio. The median age at VCUG was 13 months. Younger age at VCUG (<6 months) was associated with greater odds of severe VUR (odds ratio [OR] 2.0), and there was a weak correlation between age and VUR grade (ρ=-0.17). Male sex was associated with increased odds of high-grade VUR (OR 2.7). VCUGs indicated for hydronephrosis were associated with high-grade VUR (OR 4.1) compared to those indicated for UTI only. Ureter tortuosity and width were significantly associated with each clinical variable and VUR severity. CONCLUSIONS: Male sex, younger age (<6 months), and history of hydronephrosis are associated with both high-grade VUR and standardized quantitative measures, including greater ureter tortuosity and increased ureteral width. This lends support to quantitative assessment to improve reliability in VUR grading.
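
The associations in this abstract are reported as odds ratios. The sketch below shows how an unadjusted odds ratio and a Wald 95% confidence interval can be derived from a 2x2 table. The counts are hypothetical and chosen only for illustration; the abstract does not report the underlying tables or whether its estimates were adjusted.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio with Wald 95% CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


# Hypothetical counts: exposure = age < 6 months, outcome = high-grade VUR.
or_est, or_low, or_high = odds_ratio(40, 60, 25, 75)
print(round(or_est, 2), round(or_low, 2), round(or_high, 2))
```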

11.
Lancet Digit Health ; 5(7): e435-e445, 2023 07.
Article in English | MEDLINE | ID: mdl-37211455

ABSTRACT

BACKGROUND: Accurate prediction of side-specific extraprostatic extension (ssEPE) is essential for performing nerve-sparing surgery to mitigate treatment-related side-effects such as impotence and incontinence in patients with localised prostate cancer. Artificial intelligence (AI) might provide robust and personalised ssEPE predictions to better inform nerve-sparing strategy during radical prostatectomy. We aimed to develop, externally validate, and perform an algorithmic audit of an AI-based Side-specific Extra-Prostatic Extension Risk Assessment tool (SEPERA). METHODS: Each prostatic lobe was treated as an individual case such that each patient contributed two cases to the overall cohort. SEPERA was trained on 1022 cases from a community hospital network (Trillium Health Partners; Mississauga, ON, Canada) between 2010 and 2020. Subsequently, SEPERA was externally validated on 3914 cases across three academic centres: Princess Margaret Cancer Centre (Toronto, ON, Canada) from 2008 to 2020; L'Institut Mutualiste Montsouris (Paris, France) from 2010 to 2020; and Jules Bordet Institute (Brussels, Belgium) from 2015 to 2020. Model performance was characterised by area under the receiver operating characteristic curve (AUROC), area under the precision recall curve (AUPRC), calibration, and net benefit. SEPERA was compared against contemporary nomograms (ie, Sayyid nomogram, Soeterik nomogram [non-MRI and MRI]), as well as a separate logistic regression model using the same variables included in SEPERA. An algorithmic audit was performed to assess model bias and identify common patient characteristics among predictive errors. FINDINGS: Overall, 2468 patients comprising 4936 cases (ie, prostatic lobes) were included in this study. SEPERA was well calibrated and had the best performance across all validation cohorts (pooled AUROC of 0·77 [95% CI 0·75-0·78] and pooled AUPRC of 0·61 [0·58-0·63]). 
In patients with pathological ssEPE despite benign ipsilateral biopsies, SEPERA correctly predicted ssEPE in 72 (68%) of 106 cases compared with the other models (47 [44%] in the logistic regression model, none in the Sayyid model, 13 [12%] in the Soeterik non-MRI model, and five [5%] in the Soeterik MRI model). SEPERA had higher net benefit than the other models to predict ssEPE, enabling more patients to safely undergo nerve-sparing. In the algorithmic audit, no evidence of model bias was observed, with no significant difference in AUROC when stratified by race, biopsy year, age, biopsy type (systematic only vs systematic and MRI-targeted biopsy), biopsy location (academic vs community), and D'Amico risk group. According to the audit, the most common errors were false positives, particularly for older patients with high-risk disease. No aggressive tumours (ie, grade >2 or high-risk disease) were found among false negatives. INTERPRETATION: We demonstrated the accuracy, safety, and generalisability of using SEPERA to personalise nerve-sparing approaches during radical prostatectomy. FUNDING: None.
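
Discrimination in this abstract is summarised by the AUROC, which can be read as the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition; the labels and scores are hypothetical, and this is an illustration of the metric rather than the study's evaluation code.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical cases (one per prostatic lobe): 1 = ssEPE present.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
auc = auroc(y_true, y_score)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; rank-based implementations are preferable for large cohorts.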


Subject(s)
Artificial Intelligence , Prostate , Male , Humans , Retrospective Studies , Prostatectomy , Risk Assessment
12.
Urol Oncol ; 41(3): 137-144, 2023 03.
Article in English | MEDLINE | ID: mdl-36428167

ABSTRACT

OBJECTIVE: To determine the patient characteristics and role of nephron-sparing surgery (NSS) in the treatment of children and young adults with renal cell carcinoma (RCC). METHODS: A systematic search of Embase, MEDLINE, and Scopus databases was conducted in December 2021 according to Cochrane collaboration recommendations. All included manuscripts were assessed for patient characteristics and all reported outcomes for patients undergoing partial nephrectomy (PN), and radical nephrectomy (RN) outcomes were abstracted as a comparison group. Primary outcomes included surgical outcomes, overall survival, and kidney outcomes. Outcomes were pooled with weighted mean and ranges. Meta-analysis was not performed given study quality. This systematic review was prospectively registered on PROSPERO (CRD42022300261). RESULTS: We found a total of 16 studies describing 119 and 559 unique patients undergoing PN and RN, respectively, with a mean age of 12.2 years and mean follow-up of 59.1 months. The mean tumor size for patients undergoing PN was 3.5 cm. Of the 113 patients undergoing PN with available data, 109 were alive at follow-up (98%). No studies reported long-term kidney outcomes, and four studies reported surgical outcomes. All studies had at least moderate risk of bias. CONCLUSIONS: The use of NSS in children and young adults with RCC is feasible in selected patients. However, small sample sizes, confounding, and low study quality limit clinical recommendation on NSS in this population. There are significant opportunities for future research on the use of NSS in RCC, especially with systematic reporting of oncological, kidney, and surgical outcomes.


Subject(s)
Renal Cell Carcinoma , Kidney Neoplasms , Humans , Child , Young Adult , Renal Cell Carcinoma/pathology , Kidney Neoplasms/pathology , Nephrectomy/adverse effects , Kidney/pathology , Nephrons/surgery , Treatment Outcome , Retrospective Studies
13.
J Endourol ; 37(4): 474-494, 2023 04.
Article in English | MEDLINE | ID: mdl-36266993

ABSTRACT

Introduction: Previous systematic reviews related to machine learning (ML) in urology often overlooked the literature related to endourology. Therefore, we aim to conduct a more focused systematic review examining the use of ML algorithms for the management of benign prostatic hyperplasia (BPH) or urolithiasis. In addition, we are the first group to evaluate these articles using the Standardized Reporting of Machine Learning Applications in Urology (STREAM-URO) framework. Methods: Searches of MEDLINE, Embase, and the Cochrane CENTRAL databases were conducted from inception through July 12, 2021. Keywords included those related to ML, endourology, urolithiasis, and BPH. Two reviewers screened the citations that were eligible for title, abstract, and full-text screening, with conflicts resolved by a third reviewer. Two reviewers extracted information from the studies, with discrepancies resolved by a third reviewer. The data collected were then qualitatively synthesized by consensus. Two reviewers evaluated each article according to the STREAM-URO checklist with discrepancies resolved by a third reviewer. Results: After identifying 459 unique citations, 63 articles were retained for data extraction. Most articles consisted of tabular (n = 32) and computer vision (n = 23) tasks. The two most common problem types were classification (n = 40) and regression (n = 12). In general, most studies utilized neural networks as their ML algorithm (n = 36). Among the 63 studies retrieved, 58 were related to urolithiasis and 5 focused on BPH. The urolithiasis studies were designed for outcome prediction (n = 20), stone classification (n = 18), diagnostics (n = 17), and therapeutics (n = 3). The BPH studies were designed for outcome prediction (n = 2), diagnostics (n = 2), and therapeutics (n = 1). On average, the urolithiasis and BPH articles met 13.8 (standard deviation 2.6), and 13.4 (4.1) of the 26 STREAM-URO framework criteria, respectively. 
Conclusions: The majority of the retrieved studies effectively helped with outcome prediction, diagnostics, and therapeutics for both urolithiasis and BPH. While ML shows great promise in improving patient care, it is important to adhere to the recently developed STREAM-URO framework to ensure the development of high-quality ML studies.


Subject(s)
Prostatic Hyperplasia , Urolithiasis , Male , Humans , Prostatic Hyperplasia/diagnosis , Urolithiasis/diagnosis , Urolithiasis/therapy , Machine Learning
14.
Can Urol Assoc J ; 17(4): 121-128, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36486178

ABSTRACT

INTRODUCTION: Penile inversion vaginoplasty (PIV) remains the gold standard technique for vaginoplasty, a gender-affirming feminizing surgery, but has been associated with urinary complications; however, there is little literature synthesizing urinary complications after PIV surgery, and there is a need to compile these complications to counsel patients pre- and postoperatively on managing surgical expectations. In this systematic review, we summarize the prevalence of urinary complications following PIV. METHODS: We searched the MEDLINE, EMBASE, CINAHL, and Scopus databases in July 2020. The primary outcome was the prevalence of urinary and surgical complications in patients after penile inversion vaginoplasty. Pooled prevalence was determined from extrapolated data. ROBINS-I tool was used to assess study quality. The study was prospectively registered on PROSPERO (CRD 42020204139). RESULTS: Of 843 unique records, 27 articles were pooled for synthesis, with 3388 patients in total. Overall patient satisfaction ranged from 80-100%. The most common urological complications included poor/splayed stream (11.7%, 95% confidence interval [CI] 5.7-19.3), meatal stenosis (6.9%, 95% CI 2.7-12.7), and irritative symptoms (frequency, urgency, nocturia) (11.5%, 95% CI 2.6-25.1). Other urinary complications included retention requiring catheterization (5.1%, 95% CI 0.3-13.8), incontinence (8.7%, 95% CI 3.4-15.6), urethral stricture (4.6%, 95% CI 1.2-9.8), and urinary tract infection (5.6%, 95% CI 2.7-9.4). Most pooled studies had moderate risk of bias. CONCLUSIONS: The available evidence suggests that there is a low prevalence of urinary complications following PIV. Overall, there is a need for standardization of data in transgender surgical care to better understand surgical outcomes and improve postoperative management.

15.
Urology ; 172: 170-173, 2023 02.
Article in English | MEDLINE | ID: mdl-36450318

ABSTRACT

OBJECTIVE: To determine long-term kidney outcomes in boys with posterior urethral valve (PUV) undergoing either primary valve ablation or urinary diversion with matched baseline kidney function. METHODS: After retrospective review of patients managed for PUV at our institution, propensity score matched analysis was conducted using nadir serum creatinine with logistic regression analysis. Nearest neighbor matching was used to allocate boys to primary urinary diversion and primary ablation groups. Primary outcomes included kidney function by creatinine or estimated glomerular filtration rate, chronic kidney disease, and end-stage renal disease. Comparative statistics by odds ratio (OR) and hazard ratio on survival analysis were calculated. RESULTS: A total of 21 boys undergoing primary diversion were matched with 42 boys undergoing ablation using nadir serum creatinine and follow-up time, with a median follow-up of 4.8 years. After matching, there was no significant difference in last follow-up kidney function by creatinine (P = .99) or estimated glomerular filtration rate (P = .98). Primary diversion was not associated with increased likelihood of developing chronic kidney disease stage 3 (OR 1.33; P = .31) or end-stage renal disease (OR 1.88; P = .35 and hazard ratio 1.85; P = .30) compared to primary ablation. CONCLUSIONS: Our propensity matched study suggests that long-term kidney function and kidney outcomes are similar between primary ablation and primary diversion after adjusting for baseline kidney function in boys with PUV.
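
The matching step in this abstract pairs each boy in the diversion group with two boys in the ablation group by nearest neighbor matching on a propensity score. The sketch below shows one common variant, greedy 1:2 matching without replacement on a precomputed score. The unit identifiers and scores are hypothetical, and the abstract does not specify details such as calipers or matching order, so this is an assumption-laden illustration rather than the authors' procedure.

```python
def match_nearest(treated, controls, ratio=2):
    """Greedy nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a unit id to its propensity score.
    Returns a dict mapping each treated id to a list of matched control ids.
    """
    available = dict(controls)
    matches = {}
    for unit_id, score in treated.items():
        # Pick the `ratio` remaining controls with the closest scores.
        chosen = sorted(available, key=lambda c: abs(available[c] - score))[:ratio]
        matches[unit_id] = chosen
        for c in chosen:
            del available[c]  # without replacement
    return matches


# Hypothetical propensity scores (in practice estimated by logistic regression).
treated = {"T1": 0.30, "T2": 0.62}
controls = {"C1": 0.28, "C2": 0.35, "C3": 0.60, "C4": 0.70, "C5": 0.10, "C6": 0.90}
matches = match_nearest(treated, controls, ratio=2)
print(matches)
```

Greedy matching depends on the order in which treated units are processed; optimal matching or a caliper are common refinements.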


Subject(s)
Chronic Kidney Failure , Chronic Renal Insufficiency , Urethral Obstruction , Urinary Diversion , Male , Humans , Urethra/surgery , Creatinine , Urinary Diversion/adverse effects , Kidney/surgery , Chronic Renal Insufficiency/complications , Chronic Kidney Failure/surgery , Retrospective Studies
16.
Urol Oncol ; 41(6): 284-291, 2023 06.
Article in English | MEDLINE | ID: mdl-36088245

ABSTRACT

Meningeal metastases (MM) are a rare progression in advanced prostate cancer. Here we aimed to characterize the incidence, clinical presentation, and outcomes of patients with MM, including dural and leptomeningeal metastases, from primary prostate cancer. A systematic search was performed on MEDLINE, EMBASE, Scopus, and Web of Science. Studies that included patients who developed MM from primary prostate cancer were abstracted. Assessed outcomes included time from primary cancer to MM and MM to death, and clinical presentation of MM, among others. Case reports were compared qualitatively, while observational studies were pooled for quantitative synthesis. The systematic review was prospectively registered on PROSPERO (CRD42020205378). Our institutional series, 11 observational studies, and 46 case reports were synthesized, comprising a total of 191 patients. From the observational studies, the mean age at developing MM was 63.0 years (range: 58.4, 70.9). Presenting neurological symptoms were variable and largely depended on location of MM. The mean time from prostate cancer to MM was 54.6 months (range: 21.0, 101.5), and the mean time from MM to death was 9.0 months (range: 2.6, 23.0). Patients requiring resection for MM had shorter survival after disease progression compared to patients receiving radiation or supportive therapy. All articles had at least moderate risk of bias. We describe the largest synthesis of patients with progression to MM from prostate cancer. Current evidence is very low-quality and primarily stems from small observational studies. Neurological symptoms in the setting of advanced prostate cancer, especially in high-risk disease, warrant radiographic imaging for MM. Further prospective research on risk factors and treatment for MM is warranted.


Subject(s)
Prostatic Neoplasms , Humans , Male , Prostate/pathology , Prostatic Neoplasms/pathology
17.
J Urol ; 208(6): 1314-1322, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36215077

ABSTRACT

PURPOSE: Vesicoureteral reflux grading from voiding cystourethrograms is highly subjective with low reliability. We aimed to demonstrate improved reliability for vesicoureteral reflux grading with simple and machine learning approaches using ureteral tortuosity and dilatation on voiding cystourethrograms. MATERIALS AND METHODS: Voiding cystourethrograms were collected from our institution for training and 5 external data sets for validation. Each voiding cystourethrogram was graded by 5-7 raters to determine a consensus vesicoureteral reflux grade label, and inter- and intra-rater reliability were assessed. Each voiding cystourethrogram was assessed for 4 features: ureteral tortuosity, proximal, distal, and maximum ureteral dilatation. The labels were then assigned to the combination of the 4 features. A machine learning-based model, qVUR, was trained to predict vesicoureteral reflux grade from these features and model performance was assessed by AUROC (area under the receiver-operator-characteristic curve). RESULTS: A total of 1,492 kidneys and ureters were collected from voiding cystourethrograms resulting in a total of 8,230 independent gradings. The internal inter-rater reliability for vesicoureteral reflux grading was 0.44 with a median percent agreement of 0.71 and low intra-rater reliability. Higher values for each feature were associated with higher vesicoureteral reflux grade. qVUR performed with an accuracy of 0.62 (AUROC=0.84) with stable performance across all external data sets. The model improved vesicoureteral reflux grade reliability by 3.6-fold compared to traditional grading (P < .001). CONCLUSIONS: In a large pediatric population from multiple institutions, we show that machine learning-based assessment for vesicoureteral reflux improves reliability compared to current grading methods. qVUR is generalizable and robust with similar accuracy to clinicians but the added prognostic value of quantitative measures warrants further study.


Subject(s)
Ureter , Vesico-Ureteral Reflux , Child , Humans , Vesico-Ureteral Reflux/diagnostic imaging , Reproducibility of Results , Cystography/methods , Machine Learning , Retrospective Studies
18.
Front Digit Health ; 4: 929508, 2022.
Article in English | MEDLINE | ID: mdl-36052317

ABSTRACT

As more artificial intelligence (AI) applications are integrated into healthcare, there is an urgent need for standardization and quality-control measures to ensure a safe and successful transition of these novel tools into clinical practice. We describe the role of the silent trial, which evaluates an AI model on prospective patients in real-time, while the end-users (i.e., clinicians) are blinded to predictions such that they do not influence clinical decision-making. We present our experience in evaluating a previously developed AI model to predict obstructive hydronephrosis in infants using the silent trial. Although the initial model performed poorly on the silent trial dataset (AUC 0.90 to 0.50), the model was refined by exploring issues related to dataset drift, bias, feasibility, and stakeholder attitudes. Specifically, we found a shift in distribution of age, laterality of obstructed kidneys, and change in imaging format. After correction of these issues, model performance improved and remained robust across two independent silent trial datasets (AUC 0.85-0.91). Furthermore, a gap in patient knowledge on how the AI model would be used to augment their care was identified. These concerns helped inform the patient-centered design for the user-interface of the final AI model. Overall, the silent trial serves as an essential bridge between initial model development and clinical trials assessment to evaluate the safety, reliability, and feasibility of the AI model in a minimal risk environment. Future clinical AI applications should make efforts to incorporate this important step prior to embarking on a full-scale clinical trial.
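
The dataset drift described in this abstract includes a shift in the age distribution between the development data and the silent trial data. One simple way to flag such a shift for a continuous variable is the standardized mean difference between the two cohorts. The sketch below is an illustration only: the abstract does not say how drift was detected, and the ages are hypothetical.

```python
from statistics import mean, stdev


def standardized_mean_difference(development, silent_trial):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((stdev(development) ** 2 + stdev(silent_trial) ** 2) / 2) ** 0.5
    return (mean(silent_trial) - mean(development)) / pooled_sd


# Hypothetical ages in months for the two cohorts.
development_ages = [2, 3, 4, 5, 6]
silent_trial_ages = [6, 7, 8, 9, 10]
smd = standardized_mean_difference(development_ages, silent_trial_ages)
print(round(smd, 2))
```

An absolute value above roughly 0.1 to 0.2 is often treated as a meaningful imbalance, although any cutoff is a convention rather than a rule.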

19.
Can Urol Assoc J ; 16(6): 213-221, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35099382

ABSTRACT

INTRODUCTION: We aimed to develop an explainable machine learning (ML) model to predict side-specific extraprostatic extension (ssEPE) to identify patients who can safely undergo nerve-sparing radical prostatectomy using preoperative clinicopathological variables. METHODS: A retrospective sample of clinicopathological data from 900 prostatic lobes at our institution was used as the training cohort. Primary outcome was the presence of ssEPE. The baseline model for comparison had the highest performance out of current biopsy-derived predictive models for ssEPE. A separate logistic regression (LR) model was built using the same variables as the ML model. All models were externally validated using a testing cohort of 122 lobes from another institution. Models were assessed by area under receiver-operating-characteristic curve (AUROC), precision-recall curve (AUPRC), calibration, and decision curve analysis. Model predictions were explained using SHapley Additive exPlanations. This tool was deployed as a publicly available web application. RESULTS: Incidence of ssEPE in the training and testing cohorts was 30.7% and 41.8%, respectively. The ML model achieved AUROC 0.81 (LR 0.78, baseline 0.74) and AUPRC 0.69 (LR 0.64, baseline 0.59) on the training cohort. On the testing cohort, the ML model achieved AUROC 0.81 (LR 0.76, baseline 0.75) and AUPRC 0.78 (LR 0.75, baseline 0.70). The ML model was explainable, well-calibrated, and achieved the highest net benefit for clinically relevant cutoffs of 10-30%. CONCLUSIONS: We developed a user-friendly application that enables physicians without prior ML experience to assess ssEPE risk and understand factors driving these predictions to aid surgical planning and patient counselling (https://share.streamlit.io/jcckwong/ssepe/main/ssEPE_V2.py).

20.
J Pediatr Urol ; 18(1): 78.e1-78.e7, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34736872

ABSTRACT

INTRODUCTION: The objectivity of vesicoureteral reflux (VUR) grading has come into question because of its low inter-rater reliability. Using quantitative image features to aid in VUR grading may make it more consistent. OBJECTIVE: To develop a novel quantitative approach to assigning VUR grade from voiding cystourethrograms (VCUG) alone. STUDY DESIGN: An online dataset of VCUGs was abstracted and individual renal units were graded as low-grade (I-III) or high-grade (IV-V). We developed an image analysis and machine learning workflow to automatically calculate and normalize the ureteropelvic junction (UPJ) width, ureterovesical junction (UVJ) width, maximum ureter width, and tortuosity of the ureter based on three simple user annotations. A random forest classifier was trained to distinguish between low- and high-grade VUR. An external validation cohort was generated from the institutional imaging repository. Discriminative capability was quantified using receiver-operating-characteristic and precision-recall curve analysis. We used Shapley Additive exPlanations to interpret the model's predictions. RESULTS: 41 renal units were abstracted from an online dataset, and 44 renal units were collected from the institutional imaging repository. Significant differences were observed in UVJ width, UPJ width, maximum ureter width, and tortuosity between low- and high-grade VUR. The random forest classifier performed favourably, with an accuracy of 0.83, AUROC of 0.90 and AUPRC of 0.89 on leave-one-out cross-validation, and an accuracy of 0.84, AUROC of 0.88 and AUPRC of 0.89 on external validation. Tortuosity had the highest feature importance, followed by maximum ureter width, UVJ width, and UPJ width. We deployed this tool as a web application, qVUR (quantitative VUR), where users can upload any VCUG for automated grading using the model generated here (https://akhondker.shinyapps.io/qVUR/). DISCUSSION: This study provides the first step towards creating an automated and more objective standard for determining the significance of VUR features. Our findings suggest that tortuosity and ureter dilatation are predictors of high-grade VUR. Moreover, this proof-of-concept model was deployed in a simple-to-use web application. CONCLUSION: Grading of VUR using quantitative metrics is possible, even in non-standardized datasets of VCUG. Machine learning methods can be applied to objectively grade VUR in the future.
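The abstract does not define how ureter tortuosity was computed or normalized. As a minimal sketch, assuming one common definition (length of the traced path divided by the straight-line distance between its endpoints), the feature could be calculated as below. The function name `tortuosity` and the coordinates are hypothetical and are not taken from the qVUR tool.

```python
import math


def tortuosity(points):
    """Path length divided by endpoint-to-endpoint (chord) distance.

    `points` is an ordered list of (x, y) coordinates traced along the
    ureter. A perfectly straight path gives 1.0; winding paths give more.
    """
    if len(points) < 2:
        raise ValueError("at least two points are required")
    path_length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord_length = math.dist(points[0], points[-1])
    if chord_length == 0:
        raise ValueError("first and last points must differ")
    return path_length / chord_length


# Hypothetical pixel coordinates, for illustration only.
straight_ureter = [(0, 0), (1, 0), (2, 0)]
winding_ureter = [(0, 0), (3, 4), (6, 0)]
straight_score = tortuosity(straight_ureter)
winding_score = tortuosity(winding_ureter)
```

Because the ratio is dimensionless, it does not depend on image scale, which is one reason a measure of this kind suits non-standardized images.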


Subject(s)
Vesico-Ureteral Reflux, Cystography/methods, Humans, Infant, Machine Learning, Reproducibility of Results, Retrospective Studies, Vesico-Ureteral Reflux/diagnostic imaging